Game theory is the standard tool used to model strategic interactions in evolutionary biology and 
social science. Traditional game theory studies the equilibria of simple games. But is traditional 
game theory applicable if the game is complicated, and if not, what is? We investigate this question 
here, defining a complicated game as one with many possible moves, and therefore many possible 
payoffs conditional on those moves. We investigate two-person games in which the players learn 
based on experience. By generating games at random we show that under some circumstances the 
strategies of the two players converge to fixed points, but under others they follow limit cycles 
or chaotic attractors. The dimension of the chaotic attractors can be very high, implying that 
the dynamics of the strategies are effectively random. In the chaotic regime the payoffs fluctuate 
intermittently, showing bursts of rapid change punctuated by periods of quiescence, similar to what 
is observed in fluid turbulence and financial markets. Our results suggest that such intermittency 
is a highly generic phenomenon, and that there is a large parameter regime for which complicated 
strategic interactions generate inherently unpredictable behavior that is best described in the 
language of dynamical systems theory. 



Traditional game theory usually gives a good under- 
standing for simple games with a few players, or with 
only a few possible moves, characterizing the solutions 
in terms of their equilibria [1, 2]. The applicability of 
this approach is not clear when the game becomes more 
complicated, for example due to more players or a larger 
strategy space, which can cause an explosion in the num- 
ber of possible equilibria [3HS]- This is further compli- 
cated if the players are not rational and must learn their 
strategies [Tllllj. In a few special cases it has been ob- 
served that the strategies display complex dynamics and 
fail to converge to equilibrium solutions. Are such 
games special, or is this typical behavior? More gener- 
ally, under what circumstances should we expect that 
games become so hard to learn that their dynamics fail 
to converge? What kind of behavior should we expect 
and how should we characterize the solutions? 

As an example of what we mean compare the games of tic-tac-toe and chess. Tic-tac-toe is a simple
game with only 765 possible positions and 26,830 distinct sequences of moves. Young children easily
discover the Nash equilibrium, which results in a draw, at which point the game becomes
uninteresting. In contrast, chess is a complicated game with roughly $10^{47}$ possible positions
and $10^{123}$ possible sequences of moves; despite a huge effort, the Nash equilibrium
(corresponding to an ideal game) remains unknown. Equilibrium concepts of game theory are not
useful in describing complicated games such as chess or go (which has an even larger game tree
with roughly $10^{360}$ possible sequences of moves). An example that is even closer to what we
have in mind here is investing in financial markets, which is a non-zero sum game where players
can choose between thousands of assets and a rich set of possible strategies. 

Here we show that if the players use a standard ap- 
proach to learning, for complicated games there is a large 
parameter regime in which one should expect complex 
dynamics. By this we mean that the players never con- 
verge to a fixed strategy. Instead their strategies contin- 
ually vary as each player responds to past conditions and 
attempts to do better than the other players. The tra- 
jectories in the strategy space display high-dimensional 
chaos, suggesting that for most intents and purposes the 
behavior is essentially random, and the future evolution 
is inherently unpredictable. 



I. MODEL 

To address the questions raised above we study two-player games. For convenience call the two
players Alice and Bob. At each time step $t$ player $\mu \in \{\text{Alice} = A, \text{Bob} = B\}$
chooses between one of $N$ possible moves, picking the $i$th move with frequency $x_i^\mu(t)$,
where $i = 1, \ldots, N$. The frequency vector $x^\mu(t) = (x_1^\mu, \ldots, x_N^\mu)$ is the
strategy of player $\mu$. If Alice plays $i$ and Bob plays $j$, Alice receives payoff
$\Pi_{ij}^A$ and Bob receives payoff $\Pi_{ji}^B$. 

We assume that the players learn their strategies $x^\mu$ 
via a form of reinforcement learning called experience 
weighted attraction. This has been extensively studied by 
experimental economists who have shown that it provides 
a reasonable approximation for how real people learn in 
games [7HS]. Actions that have proved to be successful 
in the past are played more frequently and moves that have been less successful are played less
frequently. To be more specific, the probability of a given move is 

$$x_i^\mu(t) = \frac{e^{\beta Q_i^\mu(t)}}{\sum_k e^{\beta Q_k^\mu(t)}}, \tag{1}$$

where $Q_i^\mu$ is called the attraction for player $\mu$ to strategy $i$. In the special case of
experience weighted attraction that we use here, Alice's attractions are updated according to 

$$Q_i^A(t+1) = (1-\alpha)\, Q_i^A(t) + \sum_j \Pi_{ij}^A\, x_j^B(t), \tag{2}$$

and similarly for Bob with $A$ and $B$ interchanged. 

FIG. 1: An illustration of complex learning dynamics, depicted in terms of trajectories of the strategy $x^\mu(t)$ for different 
parameters ($\beta = 0.01$). In (a) the attractor is a limit cycle, whereas (b-d) are chaotic attractors of increasingly high dimension. 
There are $N = 50$ possible moves; the upper panel shows an arbitrary three-dimensional projection of each attractor in the 
98-dimensional phase space, and lower panels show the strategy as a function of time for the corresponding three coordinates. 
For clarity we use logarithmic scale. As the dimension of the attractor increases, so does the range of $x_i^\mu$. For the highest 
dimensional case a given move has occasional bursts where it is highly probable, and long periods where it is extremely 
improbable (as low as $10^{-72}$). 

The dynamics for updating the strategies $x^\mu$ of the two players are completely deterministic.
This approximates the situation in which the players vary their strategies slowly in comparison
to the timescale on which they play the game. 

The key parameters that characterize the learning strategy are $\alpha$ and $\beta$. The
parameter $\beta$ is called the intensity of choice; when $\beta$ is large a small historical
advantage for a given move causes that move to be very probable, and when $\beta = 0$ all moves
are equally likely. The parameter $\alpha$ specifies the memory in the learning; when
$\alpha = 1$ there is no memory of previous learning steps, and when $\alpha = 0$ all learning
steps are remembered and are given equal weight, regardless of how far in the past. The case
$\alpha = 0$ corresponds to the much-studied replicator dynamics used to describe evolutionary
processes in 
population biology [TljrfrT] . 

We choose games at random by drawing the elements of the payoff matrices $\Pi_{ij}^\mu$ from a
normal distribution [3j- H3 [TS]. The mean and the covariance are chosen so that
$E[\Pi_{ij}^\mu] = 0$, $E[(\Pi_{ij}^\mu)^2] = 1/N$, and $E[\Pi_{ij}^A \Pi_{ji}^B] = \Gamma/N$,
where $E[x]$ denotes the average of $x$. The variable $\Gamma$ is a crucial parameter which
measures the deviation from a zero-sum game. When $\Gamma = -1$ the game is zero sum, i.e.
the amount Alice wins is equal to the amount Bob loses, whereas when $\Gamma = 0$ their payoffs
are uncorrelated. 
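
A minimal sketch of how such a random game could be drawn, assuming each pair of entries (Alice's payoff for the move pair $(i, j)$ and Bob's payoff for the same move pair) is bivariate normal with the moments quoted above. The function name `random_payoffs`, its arguments, and the storage convention are illustrative choices, not taken from the paper.

```python
import numpy as np


def random_payoffs(n_moves, gamma, rng):
    """Draw payoff matrices (a, b) for a random two-player game.

    Assumed convention: a[i, j] is Alice's payoff and b[j, i] is Bob's
    payoff when Alice plays i and Bob plays j. Entries have mean 0,
    variance 1/N, and E[a[i, j] * b[j, i]] = gamma / N.
    """
    z1 = rng.standard_normal((n_moves, n_moves))
    z2 = rng.standard_normal((n_moves, n_moves))
    a = z1 / np.sqrt(n_moves)
    # Mix z1 with an independent z2 so that corr(a[i, j], b[j, i]) = gamma.
    b = (gamma * z1 + np.sqrt(1.0 - gamma**2) * z2).T / np.sqrt(n_moves)
    return a, b


rng = np.random.default_rng(0)
a, b = random_payoffs(50, -1.0, rng)
# In the zero-sum case every entry satisfies a[i, j] = -b[j, i].
print(np.allclose(a, -b.T))
```

Intermediate values of `gamma` interpolate between the zero-sum case (`gamma = -1`) and uncorrelated payoffs (`gamma = 0`).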



II. RESULTS 

We simulate randomly constructed games with $N = 50$ possible moves, corresponding to a 98
dimensional state space (there are two 50 dimensional strategy vectors and two probability
constraints). The behavior observed depends on the parameters. In some cases we see stable
learning dynamics, in which the strategies $x^\mu$ of both players converge on a fixed point.
For a large section of the parameter space, however, the strategies converge to a more
complicated orbit, either a limit cycle or a chaotic attractor. We characterize the local
stability properties of the attractors by numerically computing the Lyapunov exponents
$\lambda_i$, $i = 1, \ldots, 2N - 2$, which quantify the rate of expansion or contraction of
nearby points in the state space. The Lyapunov exponents also determine the Lyapunov dimension
$D$, which measures the number of degrees of freedom of the motion on the attractor. 
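
The text does not spell out how $D$ is obtained from the exponents. A common definition is the Kaplan-Yorke formula, and the sketch below assumes that definition; the function name `lyapunov_dimension` is illustrative.

```python
import numpy as np


def lyapunov_dimension(exponents):
    """Kaplan-Yorke estimate of the attractor dimension.

    Assumption: D = k + (sum of the k largest exponents) / |lambda_{k+1}|,
    where k is the largest count for which that partial sum is >= 0.
    """
    lam = np.sort(np.asarray(exponents, dtype=float))[::-1]  # descending
    if lam.size == 0 or lam[0] < 0:
        return 0.0  # all directions contract: a stable fixed point
    partial = np.cumsum(lam)
    k = np.nonzero(partial >= 0)[0][-1]  # last index with non-negative sum
    if k == lam.size - 1:
        return float(lam.size)  # the sum never turns negative
    return (k + 1) + partial[k] / abs(lam[k + 1])


print(lyapunov_dimension([0.5, 0.0, -1.0]))  # a chaotic example
print(lyapunov_dimension([0.0, -1.0, -2.0]))  # a limit-cycle example
```

Under this definition a fixed point has $D = 0$, a limit cycle has $D = 1$, and chaotic attractors have $D > 2$ in continuous time.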

We give several examples of the observed learning dynamics at different parameter values in
Fig. 1. These include a limit cycle and chaotic attractors of varying dimensionality. There can
also be long transients in which the trajectory follows a complicated orbit for a long time
and then suddenly collapses into a fixed point. In general the behavior observed depends on the
random draws of the payoff matrices $\Pi_{ij}^\mu$, but as we move away from the stability
boundary, for a given set of parameters we observe fairly consistent behavior. 

Simulating games at many different parameter values reveals the stability diagram given in
Fig. 2. Roughly speaking we find that the dynamics are stable [TH] when $\Gamma \approx -1$
(zero sum games) and $\alpha$ is large (short memory), i.e. in the lower right of the diagram,
and unstable when $\Gamma \approx 0$ (uncorrelated payoffs) and $\alpha$ is small (long memory),
i.e. in the upper left. Interestingly, for reasons that we do not understand the highest
dimensional behavior is observed when the payoffs are moderately anti-correlated
($\Gamma \approx -0.6$) and when players have long memory ($\alpha \approx 0$). 

In order to make the problem analytically tractable 
we have made specific choices in the parameters for ex- 
perience weighted attraction (EWA). Comparison with 
behavioral experiments modeled with EWA as reported 
in [7] shows that the particular form we are using here 
is roughly within the range observed in real experi- 
ments. Values for memory-loss parameters and intensity 
of choice reported from experiments suggest that real- 
world decision making may well operate near or in the 
chaotic phase (see Supplementary Information). Most 
experimental data is limited to low-dimensional games 
however, whereas here we study games with a large num- 
ber of possible moves. A good example where high di- 
mensional chaotic behavior is likely is in financial mar- 
kets, where there are a huge number of possible moves 
and learning times are measured in years. High dimen- 
sional chaotic behavior can be effectively indistinguish- 
able from noise. 

A good approximation of the boundary between the stable and unstable regions of the parameter
space can be computed analytically using techniques from statistical physics. We use
path-integral methods from the theory of disordered systems [5, 6] to compute the stability in
the limit of infinite payoff matrices, $N \to \infty$. We do this in a continuous-time limit
where, for fixed $\Gamma$, stability then depends only on the ratio $\alpha/\beta$ (see
Supplementary Information). 

We have simulated games for various values of $N$. If $D > 0$ at small $N$, the dimension $D$
tends to increase with $N$. At this stage we have been unable to tell whether $D$ reaches a
finite limit or grows without bound as $N \to \infty$. 

An interesting property of this system is the time dependence of the received payoffs. As shown
in Fig. 3, when the dynamics are chaotic the total payoff to all the players varies, with
intermittent bursts of large fluctuations punctuated by relative quiescence. This is observed,
although to varying degrees, throughout the chaotic part of the parameter space. There is a
strong resemblance to the clustered volatility observed in financial markets, which in turn
resembles the intermittency of fluid turbulence [T21 12T]. We also observe heavy tails in the
distribution of the fluctuations, as described in more detail in the Supplementary Information.
This suggests that these properties, which have received a great deal of attention in studies
of financial markets, may occur simply because they are generic properties of complicated
games [22]. 

FIG. 2: Stability diagram showing where stable vs. chaotic learning is likely ($\beta = 0.01$).
The solid line is the stability boundary estimated using path-integral methods. The coloured
squares are from simulations of the learning dynamics and represent the typical dimension of
the attractor (averaged over 10 or more independent payoff matrices per data point). 

FIG. 3: Chaotic dynamics display clustered volatility. We plot the difference of payoffs on
successive time steps for case (c) in Fig. 1. The amplitude of the fluctuations increases with
the dimension of the attractor. 
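
The payoff fluctuations discussed here can be extracted from a simulated trajectory along the following lines. This is a sketch under two assumptions that the text does not state explicitly: the "total payoff" is taken to be the sum of both players' expected payoffs under their current mixed strategies, and `b[j, i]` stores Bob's payoff when Bob plays `j` and Alice plays `i`. The function names are illustrative.

```python
import numpy as np


def total_payoff(x, y, a, b):
    """Sum of Alice's and Bob's expected payoffs for mixed strategies x, y."""
    return x @ a @ y + y @ b @ x


def payoff_increments(xs, ys, a, b):
    """Difference of the total payoff on successive time steps.

    xs and ys are sequences of strategy vectors, one entry per time step.
    """
    totals = np.array([total_payoff(x, y, a, b) for x, y in zip(xs, ys)])
    return np.diff(totals)


a = np.eye(2)
b = np.eye(2)
xs = [np.array([1.0, 0.0]), np.array([0.5, 0.5])]
ys = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]
print(payoff_increments(xs, ys, a, b))
```

Clustered volatility would show up as bursts of large absolute increments separated by quiet stretches; heavy tails can be inspected from a histogram of the same series.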






III. WHY IS DIMENSIONALITY RELEVANT? 

The dimensionality D is relevant to this problem be- 
cause high dimensionality suggests that failure to con- 
verge to a fixed point is independent of the learning al- 
gorithm, i.e. the game is intrinsically hard to learn. The 
fact that the equilibria of a game are unlearnable with 
any particular learning algorithm, such as reinforcement 
learning, does not imply that learning is not possible 
with some other learning algorithm. For example, if the 
learning dynamics settles into a limit cycle or a low di- 
mensional attractor, a careful observer could collect data 
and make better predictions about the other player using 
the method of analogues [23], or refinements based on local 
approximation [24]. If the dimension of the chaotic 
attractor is too high, however, the curse of dimension- 
ality makes this impossible with any reasonable amount 
of data [53]. This suggests that there exists no learning 
algorithm that can provide an improvement when learn- 
ing must occur inductively based on past data. The ob- 
servation of high-dimensional dynamics here leads us to 
conjecture that there are some games that are inherently 
unlearnable, in the sense that any learning algorithms 
will inevitably result in high-dimensional chaotic learning 
dynamics (see also Sato et al. [H]). 

Our work here makes it possible to predict a priori the 
qualitative properties of the learning dynamics of any 
given complicated two player game under reinforcement 
learning. This is because the payoff matrix of any given 
game is a possible draw from an ensemble of random 
games. One can make a good estimate of the stability 
properties of the learning dynamics by locating the game 
and the learning parameters in the stability diagram of 
Fig. 2. We have shown that a key property of a game 
is its "zero-sumness", characterized by $\Gamma$. Games become 
harder to learn (in the sense that the strategies do 
not converge) when they are non-zero-sum, particularly 
if the players use learning algorithms with long mem- 
ory. This analysis can potentially be extended to mul- 
tiplayer games, games on networks, alternative learning 
algorithms, etc. 

Our results suggest that under many circumstances it 
is more useful to abandon the tools of classic game theory 
in favor of those of dynamical systems. It also suggests 
that many behaviors that have attracted considerable in- 
terest, such as clustered volatility in financial markets, 
may simply be specific examples of a highly generic phe- 
nomenon, and should be expected to occur in a wide 
variety of different situations. 
Appendix A: Experience weighted attraction learning 
1. General definitions for multi-player games 

We here briefly describe the experience weighted attraction learning (EWA) model put forward in [8, 9, 25]. Consider 
a game played by $p$ players, who each choose from a set of $N$ actions (pure strategies) at each time step [40]. In the 
EWA model the probability for player $\mu \in \{1, \ldots, p\}$ to choose action $i \in \{1, \ldots, N\}$ at time $t$ is 

$$x_i^\mu(t) = \frac{e^{\beta Q_i^\mu(t)}}{\sum_{k=1}^{N} e^{\beta Q_k^\mu(t)}}, \tag{A1}$$

where the $\{Q_i^\mu\}$ are referred to as attractions or propensities [41]. The basic idea is that $Q_i^\mu(t)$ gives the "attraction" 
of player $\mu$ to action $i$ at time $t$, based on how successful strategy $i$ has been in the past. The model parameter $\beta \geq 0$ 
is called the intensity of choice. For $\beta = 0$ players pick actions with equal probability (i.e. they play completely at 
random), and for $\beta \to \infty$ each player's choice is deterministic, i.e. that player will always choose the same action for 
given values of $Q_i^\mu$, namely the action with the highest attraction. 

The update rule for the attractions $\{Q_i^\mu\}$ in the EWA model reads [8, 9, 25] 

$$Q_i^\mu(t+1) = \frac{\phi\, \mathcal{N}(t)\, Q_i^\mu(t) + \left[\delta + (1-\delta)\, I(i, s_\mu(t))\right] \Pi^\mu(i, s_{-\mu}(t))}{\mathcal{N}(t+1)}, \tag{A2}$$

where the quantity $\mathcal{N}(t)$ is updated according to 

$$\mathcal{N}(t+1) = \phi(1-\kappa)\, \mathcal{N}(t) + 1, \tag{A3}$$

see [8]. The notation is explained below: 
• We write $s_\mu(t) \in \{1, \ldots, N\}$ for the action player $\mu$ takes at time step $t$ in a given realization of the dynamics. 
The notation $-\mu$ labels all players other than $\mu$, i.e. $-\mu$ is the set $\{1, \ldots, p\} \setminus \{\mu\}$. The notation $s_{-\mu}(t)$ indicates 
the set of actions that the opponents of player $\mu$ took in a given round. Thus $s_{-\mu}(t) \in \{1, \ldots, N\}^{p-1}$ is a 
$(p-1)$-component vector, each component of which is one of the possible actions. 

• For a given $s_{-\mu}(t) \in \{1, \ldots, N\}^{p-1}$ the quantity $\Pi^\mu(i, s_{-\mu}(t))$ is a payoff matrix element, and indicates the 
payoff player $\mu$ receives when playing pure strategy $i$ and facing the actions $s_{-\mu}(t)$ of the other agents at time $t$. 

• The variable $\mathcal{N}(t)$ indicates a weight factor. An initial condition needs to be specified, which is then updated 
according to relation (A3). When $\phi(1-\kappa) = 1$, this is just the number of times the game has been played. In 
this case, since $\mathcal{N}$ cancels in the first term in the numerator on the RHS of Eq. (A2), and since it divides the 
second term, as time goes on the influence of the updates becomes smaller and smaller, i.e. past moves have 
more weight than recent moves and the behavior becomes "set". 

• The notation $I(\cdot, \cdot)$ stands for the indicator function (also called the Kronecker delta), i.e. $I(a, b) = 1$ if $a = b$ 
and $I(a, b) = 0$ otherwise. 

• The parameter $\delta$ specifies the relative weighting given to strategies that are played vs. those that are not played. 
In the case of $\delta = 1$ players update all attractions in every round, irrespective of what actions they actually 
took. The choice $\delta = 0$ corresponds to a case where only the scores of strategies that are actually used in a 
given round are updated after that round. 

• The parameter $\kappa$ interpolates between average reinforcement learning ($\kappa = 0$) and cumulative reinforcement 
learning ($\kappa = 1$), see [8, 9, 25]; we have $\mathcal{N}(t) = 1$ for all $t$ if $\kappa = 1$, the attractions then represent the 
cumulative outcome of all past play (depending on the choice of $\phi$ potentially discounted over time), for $\kappa = 0$ 
the normalisation factor $\mathcal{N}(t)$ grows with time. 

• The parameter $\phi$ specifies the weight of outcomes of play in the distant past relative to more recent iterations. 
If $\kappa = 1$ and $\phi = 1$ all past experience carries equal weight, no matter how much time has elapsed, for $\phi = 0$ only 
the most recent round affects the players' future decisions. Intermediate values of $\phi$ correspond to exponential 
discounting. 
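
To make the role of the parameters concrete, here is a minimal sketch of one EWA update for a single player, following Eqs. (A2) and (A3) as read in this appendix. The function name `ewa_step` and its argument names are hypothetical; `payoffs[i]` is assumed to hold the payoff that action `i` would have earned against the opponents' realized actions.

```python
import numpy as np


def ewa_step(attractions, weight, payoffs, played, phi, delta, kappa):
    """One update of the attractions and of the weight factor.

    attractions: current attractions, shape (N,)
    weight:      current weight factor (the quantity updated by Eq. A3)
    payoffs:     payoff of each own action against the realized opponents
    played:      index of the action actually taken this round
    """
    q = np.asarray(attractions, dtype=float)
    pay = np.asarray(payoffs, dtype=float)
    indicator = np.zeros_like(q)
    indicator[played] = 1.0
    # delta = 1: all actions reinforced; delta = 0: only the played one.
    reinforcement = (delta + (1.0 - delta) * indicator) * pay
    next_weight = phi * (1.0 - kappa) * weight + 1.0      # Eq. (A3)
    next_q = (phi * weight * q + reinforcement) / next_weight  # Eq. (A2)
    return next_q, next_weight


q, w = ewa_step([1.0, 2.0], 1.0, [0.5, -0.5], played=0,
                phi=0.9, delta=1.0, kappa=1.0)
print(q, w)
```

With `kappa = 1` the weight factor stays at 1 and the update reduces to discounted cumulative reinforcement, which is the special case used in the rest of the paper.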

2. The specific case that we study here 

There are many parameters within the formalism of experience weighted attraction, and it is beyond the scope 
of this paper to investigate all of the possible cases. We thus restrict ourselves to a particular case that is both 
analytically tractable and reasonably close to how real people play. 



We first assume that Eq. (A3) reaches a fixed point $\mathcal{N}^*$ in the long run. Letting $\mathcal{N}(t+1) = \mathcal{N}(t) = \mathcal{N}^*$ gives 

$$\mathcal{N}^* = \frac{1}{1 - \phi(1-\kappa)}. \tag{A4}$$

The update rule then simplifies to 

$$Q_i^\mu(t+1) = \phi\, Q_i^\mu(t) + \left(1 - \phi(1-\kappa)\right)\left[\delta + (1-\delta)\, I(i, s_\mu(t))\right] \Pi^\mu(i, s_{-\mu}(t)). \tag{A5}$$

We will focus here on the case $\delta = 1$, i.e. all strategy scores are updated in every iteration. Then we have 

$$Q_i^\mu(t+1) = \phi\, Q_i^\mu(t) + \left(1 - \phi(1-\kappa)\right) \Pi^\mu(i, s_{-\mu}(t)). \tag{A6}$$

Focusing on cumulative reinforcement learning [8, 9, 25], i.e. the case $\kappa = 1$, and replacing $\phi \to 1 - \alpha$ for later 
convenience, we have [42] 

$$Q_i^\mu(t+1) = (1-\alpha)\, Q_i^\mu(t) + \Pi^\mu(i, s_{-\mu}(t)). \tag{A7}$$

Eq. (A7) is the learning rule used in [14, 15]. The parameter $\alpha$ describes memory loss. For $\alpha = 0$ past payoffs are 
not discounted, and the memory of players covers the full history of play. For $0 < \alpha < 1$ past payoffs are taken into 
account with exponentially decreasing weights. 



To summarise, the learning model we investigate is defined by Eq. (A1) together with Eq. (A7): 

• Eq. (A1) specifies how a given player $\mu \in \{1, \ldots, p\}$ translates his or her set of attractions $Q_i^\mu$, $i \in \{1, \ldots, N\}$, 
into a mixed strategy $(x_1^\mu, \ldots, x_N^\mu)$. He or she will choose action $i \in \{1, \ldots, N\}$ with the probabilities defined 
by Eq. (A1). 

• Eq. (A7) specifies how the attractions are updated from time step $t$ to $t + 1$ once all players have chosen their 
actions in time step $t$. 
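
The two ingredients summarised above can be sketched in a few lines. This is an illustration rather than the authors' code: the names `choice_probabilities` and `update_attractions` are hypothetical, and the logit form of the choice rule is assumed from the description of the intensity of choice (equal probabilities at zero intensity, deterministic choice of the highest attraction at infinite intensity).

```python
import numpy as np


def choice_probabilities(attractions, beta):
    """Logit choice rule: probabilities proportional to exp(beta * Q_i)."""
    z = beta * np.asarray(attractions, dtype=float)
    z = z - z.max()  # shifting by a constant leaves the ratios unchanged
    w = np.exp(z)
    return w / w.sum()


def update_attractions(attractions, payoffs, alpha):
    """Discounted cumulative reinforcement with memory-loss rate alpha."""
    q = np.asarray(attractions, dtype=float)
    return (1.0 - alpha) * q + np.asarray(payoffs, dtype=float)


rng = np.random.default_rng(0)
q = np.array([0.0, 1.0, 3.0])
p = choice_probabilities(q, beta=0.5)
action = rng.choice(len(q), p=p)  # stochastic choice of a pure action
print(p, action)
```

Subtracting the maximum before exponentiating is a standard numerical safeguard and does not alter the probabilities.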

The correspondence with the EWA model of [8] can be summarized as follows: 

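$$\begin{array}{ccc}
\text{model of Camerer et al.} & \leftrightarrow & \text{notation in present work} \\
\lambda & \leftrightarrow & \beta \\
\phi & \leftrightarrow & 1 - \alpha
\end{array} \tag{A8}$$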



3. Relation to experimental data 

Parameters of the EWA learning model were fit to real-world data in [25], see in particular Table 4. This table 
shows that there is substantial variation in the parameters that provide a best fit to the data across different games. 
While we have chosen parameters that were tractable for the theoretical calculations that follow, comparison to their 
experimental results indicates that these values are fairly reasonable. For example, they find values of the parameter 
$\kappa$ in the range 0.15-0.99; we fix $\kappa = 1$. The model parameter $\delta$ obtained from experimental data varies from $\delta = 0$ to 
$\delta = 0.94$, suggesting that there is no clear conclusion on whether or not players use forgone payoffs in their adaptation; 
we fix $\delta = 1$, i.e. the propensities of all strategies are updated at every step. 

The most interesting model parameters from the point of view of our analysis are the memory-loss parameter 
$\alpha$ and the intensity of choice $\beta$. For a fixed game, these parameters largely determine whether or not one should 
expect convergence or chaotic motion. More precisely the ratio $\alpha/\beta$ is the crucial indicator for the onset of chaos, as 
explained above. Pooled data from [25] suggests a ratio of $\alpha/\beta \approx 0.03$ (but again with considerable variation across 
games). Depending on the character of the game (zero-sum or not) this can position such experiments inside the 
chaotic phase, see Fig. 5. It is important to keep in mind, however, that the games used in the experiments of [25] are 
low-dimensional, in the sense that each player has the choice only between a small number of moves. Care needs to 
be taken when extrapolating results for high-dimensional random games to these cases. Nonetheless, if one assumes 
that parameters would not dramatically change in moving from simple games to complicated games, then the data 
and model fitting of [8], taken together with our results, suggests that real-world learning in non-zero sum games may 
well operate in or near the chaotic phase. 



4. Adiabatic limit and deterministic learning 



The update of Eq. (A7) is intrinsically stochastic, as the $(p-1)$-component action vector $s_{-\mu}(t)$ at time $t$ is drawn 
according to the mixed strategy profiles of the $p - 1$ opponents player $\mu$ is facing. More precisely, player $\mu$ will face 
a specific realization of the actions $s_{-\mu} = (s_1, \ldots, s_{\mu-1}, s_{\mu+1}, \ldots, s_p) \in \{1, \ldots, N\}^{p-1}$ of all the other players with 
probability 

$$\prod_{\nu \neq \mu} x_{s_\nu}^{\nu}(t). \tag{A9}$$

In order to simplify the problem we follow [14, 15] and consider an adiabatic limit of this process. This corresponds 
to averaging over batches of a large (infinite) number of rounds between two adaptation steps, i.e. to the replacement 

$$\Pi^\mu(i, s_{-\mu}(t)) \longrightarrow \sum_{s_{-\mu}} \Pi^\mu(i, s_{-\mu}) \prod_{\nu \neq \mu} x_{s_\nu}^{\nu}(t) \equiv \overline{\Pi}_i^\mu(t) \tag{A10}$$

in Eq. (A7). The sum over $s_{-\mu}$ here runs over all elements of $\{1, \ldots, N\}^{p-1}$. We have introduced the notation $\overline{\Pi}_i^\mu(t)$ 
to indicate that the right-hand-side is the mean of the left-hand-side, i.e. $\overline{\Pi}_i^\mu(t)$ is the expected payoff for player $\mu$ if 
she chooses to play action $i$ and given her opponents' mixed strategy profiles at time $t$. In this sense the adiabatic 
limit can be understood as describing the dynamics on expectation. Fluctuation effects induced by the stochastic 
choice of pure actions by the players are here neglected, see however [27-29] for systematic studies of noisy learning 
in simple games. 

The equation for updating the attractions becomes 

$$Q_i^\mu(t+1) = (1-\alpha)\, Q_i^\mu(t) + \overline{\Pi}_i^\mu(t). \tag{A11}$$

Taking into account Eq. (A1) the learning process can then be described by the following deterministic map 

$$x_i^\mu(t+1) = \frac{x_i^\mu(t)^{1-\alpha}\, e^{\beta \overline{\Pi}_i^\mu(t)}}{\sum_k x_k^\mu(t)^{1-\alpha}\, e^{\beta \overline{\Pi}_k^\mu(t)}}. \tag{A12}$$

Here, each player chooses between $N$ actions, $i = 1, \ldots, N$, and there are $p$ players, so we have $p \times N$ variables $\{x_i^\mu\}$ 
in total. These variables satisfy the constraints $\sum_i x_i^\mu(t) = 1$ at all times $t$ for all $\mu = 1, \ldots, p$. Eq. (A12) therefore 
defines a map in a $p \times (N-1)$-dimensional phase space. 
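
The step from the attraction update to the strategy map can be made explicit. The short derivation below is a sketch that uses only the logit choice rule (A1) and the averaged update (A11), writing $\overline{\Pi}_i^\mu(t)$ for the expected payoff of action $i$ and $S^\mu(t) = \sum_k e^{\beta Q_k^\mu(t)}$ for the normalisation of (A1), a symbol introduced here for convenience.

```latex
\begin{align*}
x_i^\mu(t+1)
  &= \frac{e^{\beta Q_i^\mu(t+1)}}{\sum_k e^{\beta Q_k^\mu(t+1)}}
   = \frac{\left(e^{\beta Q_i^\mu(t)}\right)^{1-\alpha} e^{\beta \overline{\Pi}_i^\mu(t)}}
          {\sum_k \left(e^{\beta Q_k^\mu(t)}\right)^{1-\alpha} e^{\beta \overline{\Pi}_k^\mu(t)}} \\
  &= \frac{\left(S^\mu(t)\, x_i^\mu(t)\right)^{1-\alpha} e^{\beta \overline{\Pi}_i^\mu(t)}}
          {\sum_k \left(S^\mu(t)\, x_k^\mu(t)\right)^{1-\alpha} e^{\beta \overline{\Pi}_k^\mu(t)}}
   = \frac{x_i^\mu(t)^{1-\alpha}\, e^{\beta \overline{\Pi}_i^\mu(t)}}
          {\sum_k x_k^\mu(t)^{1-\alpha}\, e^{\beta \overline{\Pi}_k^\mu(t)}} .
\end{align*}
```

The factor $S^\mu(t)^{1-\alpha}$ is common to numerator and denominator and cancels, so the attractions drop out and the dynamics closes on the strategies alone.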

Appendix B: Details of the two-player learning model 

1. Definition of the dynamics 

While the previous sections described learning in a general $p$-player game, we will now restrict the further discussion 
to the case $p = 2$, i.e. to two-player games. The two players are Alice (A) and Bob (B). Each of them has $N$ strategies 
to choose from. Eqs. (A7) then read 

$$\begin{aligned}
Q_i^A(t+1) &= (1-\alpha)\, Q_i^A(t) + \Pi^A(i, s_B(t)), \\
Q_i^B(t+1) &= (1-\alpha)\, Q_i^B(t) + \Pi^B(i, s_A(t)).
\end{aligned} \tag{B1}$$

Simplification of notation: 

We will write $x_i(t)$ for the probability with which Alice uses action $i$ at time $t$, and similarly $y_i(t)$ is the probability 
with which Bob plays action $i$ at that time [43]. For simplicity we will change the notation by letting $a_{ij}$ be the payoff 
Alice receives when she plays action $i$ and when Bob plays action $j$. The payoff for Bob in this situation will be $b_{ji}$. 
The two $N \times N$ matrices $(a_{ij})$ and $(b_{ij})$, with $i, j \in \{1, \ldots, N\}$, then define an asymmetric two-player game, in which 
each player has $N$ pure strategies to choose from. Taking the deterministic limit, as described above, the update rules 
for the attractions now read 

$$\begin{aligned}
Q_i^A(t+1) &= (1-\alpha)\, Q_i^A(t) + \sum_j a_{ij}\, y_j(t), \\
Q_i^B(t+1) &= (1-\alpha)\, Q_i^B(t) + \sum_j b_{ij}\, x_j(t),
\end{aligned} \tag{B2}$$

and the map of strategy updates is given by 

$$\begin{aligned}
x_i(t+1) &= \frac{x_i(t)^{1-\alpha}\, e^{\beta \sum_j a_{ij} y_j(t)}}{\sum_k x_k(t)^{1-\alpha}\, e^{\beta \sum_j a_{kj} y_j(t)}}, \\
y_i(t+1) &= \frac{y_i(t)^{1-\alpha}\, e^{\beta \sum_j b_{ij} x_j(t)}}{\sum_k y_k(t)^{1-\alpha}\, e^{\beta \sum_j b_{kj} x_j(t)}}.
\end{aligned} \tag{B3}$$
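
A minimal sketch of one iteration of the deterministic two-player map: each player's new strategy is proportional to the old strategy raised to the power one minus the memory-loss rate, times the exponential of the intensity of choice times the expected payoff. The function name `learning_step` is illustrative, and the computation is done in log space as a numerical precaution that the paper does not prescribe.

```python
import numpy as np


def learning_step(x, y, a, b, alpha, beta):
    """One step of the deterministic learning map for Alice (x) and Bob (y).

    a[i, j]: Alice's payoff when she plays i and Bob plays j.
    b[i, j]: Bob's payoff when he plays i and Alice plays j.
    Assumes all entries of x and y are strictly positive.
    """
    log_x = (1.0 - alpha) * np.log(x) + beta * (a @ y)
    log_y = (1.0 - alpha) * np.log(y) + beta * (b @ x)
    # Subtract the maximum before exponentiating to avoid overflow.
    new_x = np.exp(log_x - log_x.max())
    new_y = np.exp(log_y - log_y.max())
    return new_x / new_x.sum(), new_y / new_y.sum()


rng = np.random.default_rng(0)
n = 50
a = rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal((n, n)) / np.sqrt(n)
x = np.full(n, 1.0 / n)
y = np.full(n, 1.0 / n)
for _ in range(1000):
    x, y = learning_step(x, y, a, b, alpha=0.01, beta=0.01)
print(x.sum(), y.sum())
```

Both strategies are updated simultaneously from the previous values, which matches the structure of the attraction updates.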



2. Relation between discrete-time dynamics and continuous-time Sato-Crutchfield equations 

Chaotic motion in learning dynamics of the above type has previously been reported for relatively low-dimensional 
games in [14, 15]. These studies were carried out for continuous-time processes, and it is therefore useful to elaborate 
on the relation of the discrete-time map defined by Eq. (B3) and the continuous-time dynamics of Sato et al. [44]. 

The discrete-time dynamics of Eq. (B3) can be written as 

$$x_i(t+1) = \frac{x_i(t)^{1-\alpha}\, e^{\beta \sum_j a_{ij} y_j(t)}}{Z_x(t)}, \qquad
y_i(t+1) = \frac{y_i(t)^{1-\alpha}\, e^{\beta \sum_j b_{ij} x_j(t)}}{Z_y(t)}, \tag{B4}$$

where we define the normalisation factors 

$$Z_x(t) = \sum_k x_k(t)^{1-\alpha}\, e^{\beta \sum_j a_{kj} y_j(t)}, \qquad
Z_y(t) = \sum_k y_k(t)^{1-\alpha}\, e^{\beta \sum_j b_{kj} x_j(t)}. \tag{B5}$$

The continuous-time Sato-Crutchfield dynamics on the other hand is given by 

$$\begin{aligned}
\dot{x}_i &= x_i \left[\sum_j a_{ij} y_j - \alpha' \ln x_i - Z'_x\right], \\
\dot{y}_j &= y_j \left[\sum_i b_{ji} x_i - \alpha' \ln y_j - Z'_y\right],
\end{aligned} \tag{B6}$$

see [14, 15] for details. The parameter $\alpha'$ indicates memory loss in this continuous dynamics, and it is hence analogous 
to the parameter $\alpha$ in the above map (B4). We will detail the relation between $\alpha$ and $\alpha'$ further below. Similarly, the 
role of $Z'_x$ and $Z'_y$ in Eqs. (B6) is to enforce the normalisation $\sum_i x_i = \sum_i y_i = 1$ at all times [45]. These quantities 
can be thought of as Lagrange multipliers; they can be expressed explicitly as 

$$Z'_x = \sum_{ij} x_i a_{ij} y_j - \alpha' \sum_i x_i \ln x_i, \qquad
Z'_y = \sum_{ij} y_j b_{ji} x_i - \alpha' \sum_j y_j \ln y_j. \tag{B7}$$

Similar to what is the case for $\alpha$ and $\alpha'$ there is a close relation between $Z_x$ and $Z'_x$ and between $Z_y$ and $Z'_y$ respectively. 
This will be explained in more detail below. 

Limit of small $\beta$: 

In order to relate the discrete-time update rule to the continuous-time Sato-Crutchfield dynamics we consider the 
limit $\beta \ll 1$ in Eq. (B4). One first writes 

$$\ln x_i(t+1) = (1-\alpha) \ln x_i(t) + \beta \sum_j a_{ij} y_j(t) - \ln Z_x(t), \tag{B8}$$

and similarly for the second equation of (B4). This is valid for all $\beta$, and can be re-arranged to give 

$$\frac{\ln x_i(t+1) - \ln x_i(t)}{\beta} = -\frac{\alpha}{\beta} \ln x_i(t) + \sum_j a_{ij} y_j(t) - Z'_x(t), \tag{B9}$$

where $Z'_x(t) = \ln Z_x(t)/\beta$. In the limit $\beta \to 0$, fixing the ratio $\alpha/\beta$ during the limiting procedure, and upon appropriate 
re-scaling of time, this turns into 

$$\frac{d}{dt} \ln x_i(t) = -\frac{\alpha}{\beta} \ln x_i(t) + \sum_j a_{ij} y_j(t) - Z'_x(t), \tag{B10}$$

i.e. 

$$\dot{x}_i(t) = x_i(t) \left(-\frac{\alpha}{\beta} \ln x_i(t) + \sum_j a_{ij} y_j(t) - Z'_x(t)\right), \tag{B11}$$

which is exactly the first equation of the continuous-time dynamics (B6), with the replacement $\alpha' = \lim_{\beta \to 0} \alpha/\beta$. A 
similar argument can be made for the dynamics of $y_i(t)$. 

We conclude that the small-$\beta$ limit of the discrete-time dynamics at memory-loss parameter $\alpha$ leads to the 
continuous-time Sato-Crutchfield dynamics with memory-loss parameter $\alpha' = \alpha/\beta$, after a re-scaling of time. 

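Relation of fixed-points 
For any choice of $\beta$ the fixed points of the equations 

$$\begin{aligned}
x_i(t+1) &= \frac{x_i(t)^{1-\alpha}\, e^{\beta \sum_j a_{ij} y_j(t)}}{\sum_k x_k(t)^{1-\alpha}\, e^{\beta \sum_j a_{kj} y_j(t)}}, \\
y_i(t+1) &= \frac{y_i(t)^{1-\alpha}\, e^{\beta \sum_j b_{ij} x_j(t)}}{\sum_k y_k(t)^{1-\alpha}\, e^{\beta \sum_j b_{kj} x_j(t)}}
\end{aligned} \tag{B12}$$

fulfill 

$$\ln x_i^* = (1-\alpha) \ln x_i^* + \beta \sum_j a_{ij} y_j^* - \ln Z_x^*, \tag{B13}$$

with a similar equation for $y_i^*$. Asterisks here indicate quantities evaluated at the fixed point. We have here assumed 
that fixed points lie in the interior of the strategy simplex, i.e. that $x_i^* > 0$ and $y_i^* > 0$ for all $i$. 
These fixed-point conditions can be reduced to 

$$\sum_j a_{ij} y_j^* - \frac{\alpha}{\beta} \ln x_i^* - Z_x'^{*} = 0, \tag{B14}$$

where $Z_x'^{*} = \ln Z_x^* / \beta$, which reproduces the fixed-point condition of the continuous dynamics, see Eqs. (B6). 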



Summary: 

(i) Up to a re-scaling of time the small-$\beta$ limit of the discrete-time dynamics at parameters $\alpha$, $\beta$ corresponds to the 
continuous-time Sato-Crutchfield dynamics at parameter $\alpha' = \alpha/\beta$. 

(ii) For any choice of $\beta$ the fixed points of the map at parameters $\alpha$, $\beta$ are identical to those of the continuous-time 
dynamics at $\alpha' = \alpha/\beta$. 

(iii) Provided a fixed point of the map exists, its components only depend on the ratio $\alpha/\beta$. 

In the following we will use the notation $r = \beta/\alpha$ to denote the relevant control parameter of the continuous dynamics. 
Given that $\alpha$ can be viewed as a damping parameter, and $\beta$ as a forcing term, the ratio $r = \beta/\alpha$ plays a role similar 
to that of a Reynolds number in fluid dynamics [26]. 
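
Point (iii) is easy to check numerically. The sketch below iterates the discrete-time map for two parameter pairs with the same ratio of memory loss to intensity of choice and compares the resulting fixed points. The game size, the parameter values, and the function names are arbitrary illustrative choices, picked so that the memory loss is strong enough for the iteration to settle on a fixed point.

```python
import numpy as np


def learning_step(x, y, a, b, alpha, beta):
    """One step of the deterministic two-player learning map (log space)."""
    log_x = (1.0 - alpha) * np.log(x) + beta * (a @ y)
    log_y = (1.0 - alpha) * np.log(y) + beta * (b @ x)
    new_x = np.exp(log_x - log_x.max())
    new_y = np.exp(log_y - log_y.max())
    return new_x / new_x.sum(), new_y / new_y.sum()


def find_fixed_point(a, b, alpha, beta, n_steps=5000):
    """Iterate the map from uniform strategies for a fixed number of steps."""
    n = a.shape[0]
    x = np.full(n, 1.0 / n)
    y = np.full(n, 1.0 / n)
    for _ in range(n_steps):
        x, y = learning_step(x, y, a, b, alpha, beta)
    return x, y


rng = np.random.default_rng(0)
n = 5
a = rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal((n, n)) / np.sqrt(n)

# Same ratio alpha / beta = 5, different absolute values.
x1, y1 = find_fixed_point(a, b, alpha=0.1, beta=0.02)
x2, y2 = find_fixed_point(a, b, alpha=0.2, beta=0.04)
print(np.max(np.abs(x1 - x2)), np.max(np.abs(y1 - y2)))
```

The printed differences should be at the level of floating-point round-off, in line with the statement that the fixed point depends on the two parameters only through their ratio.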



3. Large random two-player games 



We will now consider the case of large random games. To this end we will follow the standard spin-glass conventions 
[5, 6, 31, 34] and focus on payoff matrices with elements drawn from Gaussian distributions. These distributions are 
fully characterized by their first and second moments. Specifically we will choose the payoff matrix elements $\{a_{ij}, b_{ij}\}$ 
such that 

$$E[a_{ij}] = E[b_{ji}] = 0, \qquad E[a_{ij}^2] = E[b_{ji}^2] = \frac{1}{N}, \qquad E[a_{ij} b_{ji}] = \frac{\Gamma}{N}, \tag{B15}$$

for all pairs $i, j \in \{1, \ldots, N\}$. The significance of the parameter $\Gamma$ will be explained below. The notation $E[\cdots]$ denotes 
the average over the distribution of payoff matrices. Every single element of the payoff bi-matrix is a Gaussian random 
variable of mean zero. It is important to stress that while the payoff matrices are drawn at random at the beginning, 
they remain fixed during the time evolution of the dynamics. In the language of spin glass theory [31] they constitute 
the quenched disorder of the problem. The factors of $1/N$ in Eqs. (B15) indicate that each payoff matrix element 
is of magnitude $1/\sqrt{N}$. This scaling with $N$ is standard in spin glass theory, and chosen to ensure a non-trivial 
thermodynamic limit, $N \to \infty$, as explained below. We point out that payoff matrix elements occur in the learning 
process of Eqs. (B4) only in combinations of the type $\beta a_{ij}$ and $\beta b_{ij}$. The choice of scaling of the payoff matrices is 
therefore equivalent to re-scaling the intensity of choice $\beta$. 

The parameter T in Eqs. (BI5 1 measures correlations between the payoff matrix elements a 

if r 



T one has 



%3 and bji. 



E [( 0ij - + b 3i ) 2 } = E [{a l3 f + {b^f + 2a ij 6 ii ] = 0, 



For example 



(B16) 



i.e. a,ij = —bji with probability one, corresponding to a zero-sum game. If T = 1 one has 



E[(c 



0. 



(B17) 
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i.e. aij = bji almost surely. For T = the payoffs and bji are uncorrelated. Choices in the interval T E [—1,1] 
interpolate between the extremes. We focus on the regime of anti-correlation, — 1 < V < throughout this paper, as 
we expect this to be more realistic than positively correlated payoffs. 
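
As an illustration of the statistics in Eqs. (B15), the following minimal sketch draws a pair of payoff matrices with the required variances and correlation. It is not taken from the paper: the function name `random_payoffs`, the use of NumPy, and the construction of $b_{ji}$ as a linear mix of $a_{ij}$ and an independent Gaussian are assumptions made here for illustration.

```python
import numpy as np

def random_payoffs(n, gamma, rng):
    """Draw (a, b) with E[a_ij^2] = E[b_ji^2] = 1/n and E[a_ij b_ji] = gamma/n.

    Hypothetical helper: each pair (a_ij, b_ji) is a bivariate Gaussian
    with zero mean and correlation coefficient gamma in [-1, 1].
    """
    u = rng.standard_normal((n, n))
    v = rng.standard_normal((n, n))
    a = u / np.sqrt(n)
    # b_ji mixes a_ij with an independent Gaussian; the transpose places
    # the partner of a[i, j] at b[j, i].
    b = (gamma * u + np.sqrt(1.0 - gamma**2) * v).T / np.sqrt(n)
    return a, b
```

For `gamma = -1` the construction returns `b` equal to `-a.T` exactly, which is the zero-sum case of Eq. (B16).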

Again, following the spin-glass conventions, and to make sure the thermodynamic limit is well defined, we will 
re-scale the $\{x_i, y_i\}$ and consider the normalisation $\sum_i x_i = \sum_i y_i = N$. Each of the variables $\{x_i, y_i\}$ is then of order 
$N^0$. 

At finite $N$ the update rules in discrete time are given by 

\[
x_i(t+1) = N\, \frac{x_i(t)^{1-\alpha}\, e^{\beta \sum_j a_{ij} y_j(t)}}{\sum_k x_k(t)^{1-\alpha}\, e^{\beta \sum_j a_{kj} y_j(t)}}, \qquad
y_i(t+1) = N\, \frac{y_i(t)^{1-\alpha}\, e^{\beta \sum_j b_{ij} x_j(t)}}{\sum_k y_k(t)^{1-\alpha}\, e^{\beta \sum_j b_{kj} x_j(t)}}. \tag{B18}
\]

The above choice of scaling now becomes more transparent. The exponentials contain terms of the form $\sum_{j=1}^{N} a_{ij} y_j$ 
and $\sum_{j=1}^{N} b_{ij} x_j$, which are well defined and of order one in the thermodynamic limit ($N \to \infty$) with the above scaling. 
In continuous time one has 

\[
\begin{aligned}
\frac{\dot x_i(t)}{x_i(t)} &= -r^{-1} \ln x_i(t) + \sum_j a_{ij} y_j(t) - Z'_x(t), \\
\frac{\dot y_j(t)}{y_j(t)} &= -r^{-1} \ln y_j(t) + \sum_i b_{ji} x_i(t) - Z'_y(t),
\end{aligned} \tag{B19}
\]

as before, but the Lagrange multipliers are now chosen such that $\sum_i x_i(t) = \sum_i y_i(t) = N$ at all times. 
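
The discrete-time update rules given above, with the normalisation $\sum_i x_i = \sum_i y_i = N$, can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' simulation code: the name `ewa_step` is hypothetical, and the update is carried out on logarithms (a numerical convenience chosen here) to avoid overflow in the exponentials.

```python
import numpy as np

def ewa_step(x, y, a, b, alpha, beta):
    """One step of the discrete-time map, keeping sum(x) = sum(y) = N.

    x, y : strictly positive strategy vectors of length N
    a, b : N x N payoff matrices; alpha, beta : memory loss and intensity of choice
    """
    n = x.size
    # Log of the unnormalised update x_i^(1-alpha) * exp(beta * sum_j a_ij y_j)
    log_x = (1.0 - alpha) * np.log(x) + beta * (a @ y)
    log_y = (1.0 - alpha) * np.log(y) + beta * (b @ x)
    # Subtracting the maximum only rescales numerator and denominator alike
    x_new = np.exp(log_x - log_x.max())
    y_new = np.exp(log_y - log_y.max())
    return n * x_new / x_new.sum(), n * y_new / y_new.sum()
```

Iterating `x, y = ewa_step(x, y, a, b, alpha, beta)` from, say, `x = y = np.ones(N)` generates a trajectory of the map at fixed (quenched) payoff matrices.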



Appendix C: Path-integral analysis 

a. Generating functional description 

We will here describe the technical details of the path-integral analysis of the dynamics. These techniques are 
standard in the theory of disordered systems, see e.g. [51], and in particular [3H |33j for textbook descriptions and a 
pedagogic review. They have previously been applied to learning in minority game dynamics in [33]. The original 
application to replicator equations is due to Opper and Diederich, see [5, 6, 34]. Other applications of methods from 
disordered systems to large random games include the calculation of the number of Nash equilibria [3[ 0], and 
random replicator dynamics [35, 36]. 

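The starting point is the continuous dynamics 

\[
\begin{aligned}
\frac{\dot x_i(t)}{x_i(t)} &= -r^{-1} \ln x_i(t) + \sum_j a_{ij} y_j(t) - \rho_x(t) + h_{x,i}(t), \\
\frac{\dot y_j(t)}{y_j(t)} &= -r^{-1} \ln y_j(t) + \sum_i b_{ji} x_i(t) - \rho_y(t) + h_{y,j}(t),
\end{aligned} \tag{C1}
\]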



where we use the more compact notation $\rho_x(t)$ and $\rho_y(t)$ instead of $Z'_x(t)$ and $Z'_y(t)$. These quantities will be treated 
as Lagrange multipliers enforcing the normalisation $\sum_i x_i(t) = \sum_j y_j(t) = N$. The fields $h_{x,i}(t)$ and $h_{y,j}(t)$ have been 
introduced to generate response functions, and will be set to zero at the end of the calculation. 
The dynamical generating functional is then given by 

\[
Z[\psi, \varphi] = \int D[x, y]\; \delta(\text{equations of motion})\; \exp\Big( i \sum_i \int dt\, \big[ x_i(t)\psi_i(t) + y_i(t)\varphi_i(t) \big] \Big). \tag{C2}
\]

The source fields $\psi$ and $\varphi$ have been introduced to generate correlation functions, and will eventually be set to zero 
at the end of the calculation. The notation $\delta(\text{equations of motion})$ indicates that the integral in Eq. (C2) is over 
paths of the dynamics (C1) only, i.e. the delta-functions impose Eqs. (C1) for all $t$ and $i$. 

The next step is to write the delta functions in Eq. (C2) in their Fourier representation. We then find 

\[
\begin{aligned}
Z[\psi, \varphi] = \int D[x, y, \hat x, \hat y]\;
& \exp\Big( i \sum_i \int dt\, \hat x_i(t) \Big[ \frac{\dot x_i(t)}{x_i(t)} - \Big( -r^{-1} \ln x_i(t) + \sum_j a_{ij} y_j(t) - \rho_x(t) + h_{x,i}(t) \Big) \Big] \Big) \\
\times\, & \exp\Big( i \sum_i \int dt\, \hat y_i(t) \Big[ \frac{\dot y_i(t)}{y_i(t)} - \Big( -r^{-1} \ln y_i(t) + \sum_j b_{ij} x_j(t) - \rho_y(t) + h_{y,i}(t) \Big) \Big] \Big) \\
\times\, & \exp\Big( i \sum_i \int dt\, \big[ x_i(t)\psi_i(t) + y_i(t)\varphi_i(t) \big] \Big).
\end{aligned} \tag{C3}
\]



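Next, we isolate the terms containing the quenched disorder (the randomly chosen payoff matrix elements). One has 

\[
\begin{aligned}
Z[\psi, \varphi] = \int D[x, y, \hat x, \hat y]\;
& \exp\Big( i \sum_i \int dt\, \hat x_i(t) \Big[ \frac{\dot x_i(t)}{x_i(t)} + r^{-1} \ln x_i(t) + \rho_x(t) - h_{x,i}(t) \Big] \Big) \\
\times\, & \exp\Big( i \sum_i \int dt\, \hat y_i(t) \Big[ \frac{\dot y_i(t)}{y_i(t)} + r^{-1} \ln y_i(t) + \rho_y(t) - h_{y,i}(t) \Big] \Big) \\
\times\, & \exp\Big( i \sum_i \int dt\, \big[ x_i(t)\psi_i(t) + y_i(t)\varphi_i(t) \big] \Big) \\
\times\, & \exp\Big( -i \sum_{ij} \int dt\, \big[ \hat x_i(t) a_{ij} y_j(t) + \hat y_j(t) b_{ji} x_i(t) \big] \Big).
\end{aligned} \tag{C4}
\]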



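We are now in a position to carry out the average over the Gaussian disorder, and to compute $\mathbb{E}[Z[\psi, \varphi]]$, where 
$\mathbb{E}[\cdots]$ denotes the disorder-average. We have 

\[
\begin{aligned}
& \mathbb{E}\Big[ \exp\Big( -i \sum_{ij} \int dt\, \big[ \hat x_i(t) a_{ij} y_j(t) + \hat y_j(t) b_{ji} x_i(t) \big] \Big) \Big] \\
&\quad = \prod_{ij} \exp\Big( -\frac{1}{2N} \int dt\, dt'\, \big\{ \hat x_i(t)\hat x_i(t') y_j(t) y_j(t') + \hat y_j(t)\hat y_j(t') x_i(t) x_i(t') \\
&\qquad\qquad + \Gamma\, \hat x_i(t) x_i(t') y_j(t) \hat y_j(t') + \Gamma\, \hat x_i(t') x_i(t) y_j(t') \hat y_j(t) \big\} \Big) \\
&\quad = \exp\Big( -\frac{N}{2} \int dt\, dt'\, \big[ L_x(t,t') C_y(t,t') + L_y(t,t') C_x(t,t') + 2\Gamma K_x(t,t') K_y(t',t) \big] \Big),
\end{aligned} \tag{C5}
\]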



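where we have introduced the short-hands 

\[
\begin{aligned}
C_x(t,t') &= \frac{1}{N} \sum_i x_i(t) x_i(t'), & C_y(t,t') &= \frac{1}{N} \sum_i y_i(t) y_i(t'), \\
K_x(t,t') &= \frac{1}{N} \sum_i x_i(t) \hat x_i(t'), & K_y(t,t') &= \frac{1}{N} \sum_i y_i(t) \hat y_i(t'), \\
L_x(t,t') &= \frac{1}{N} \sum_i \hat x_i(t) \hat x_i(t'), & L_y(t,t') &= \frac{1}{N} \sum_i \hat y_i(t) \hat y_i(t').
\end{aligned} \tag{C6}
\]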



These quantities are introduced into the generating functional by means of delta-functions in their integral representation, e.g. 

\[
\begin{aligned}
1 &= \int D[C_x]\; \delta\Big( C_x(t,t') - N^{-1} \sum_i x_i(t) x_i(t') \Big) \\
  &= \int D[\hat C_x, C_x]\; \exp\Big( i N \int dt\, dt'\, \hat C_x(t,t') \Big( C_x(t,t') - N^{-1} \sum_i x_i(t) x_i(t') \Big) \Big),
\end{aligned} \tag{C7}
\]

and similarly for the other order parameters in Eq. (C6). We have chosen the scaling of the conjugate parameter 
$\hat C_x(t,t')$ such that the overall exponent carries a prefactor $N$. 

We then find that the disorder-averaged generating functional can be written in the following form 

\[
\mathbb{E}[Z[\psi, \varphi]] = \int D[C_x, C_y, L_x, L_y, K_x, K_y, \hat C_x, \hat C_y, \hat L_x, \hat L_y, \hat K_x, \hat K_y]\; \exp\big( N [\Psi + \Phi + \Omega + O(N^{-1})] \big), \tag{C8}
\]



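where 

\[
\begin{aligned}
\Psi = i \int dt\, dt'\, \big[ & \hat C_x(t,t') C_x(t,t') + \hat C_y(t,t') C_y(t,t') + \hat K_x(t,t') K_x(t,t') + \hat K_y(t,t') K_y(t,t') \\
& + \hat L_x(t,t') L_x(t,t') + \hat L_y(t,t') L_y(t,t') \big]
\end{aligned} \tag{C9}
\]

results from the introduction of the above order parameters. The term 

\[
\Phi = -\frac{1}{2} \int dt\, dt'\, \big[ L_x(t,t') C_y(t,t') + L_y(t,t') C_x(t,t') + 2\Gamma K_x(t,t') K_y(t',t) \big] \tag{C10}
\]

comes from the disorder average, and $\Omega$ describes the details of the microscopic time evolution 

\[
\begin{aligned}
\Omega ={}& N^{-1} \sum_i \log \int D[x_i, \hat x_i]\; p^{(i)}_{x,0}(x_i(0))\, \exp\Big( i \int dt\, \psi_i(t) x_i(t) \Big) \\
& \times \exp\Big( i \int dt\, \hat x_i(t) \Big[ \frac{\dot x_i(t)}{x_i(t)} + r^{-1} \ln x_i(t) + \rho_x(t) - h_{x,i}(t) \Big] \Big) \\
& \times \exp\Big( -i \int dt\, dt'\, \big[ \hat C_x(t,t') x_i(t) x_i(t') + \hat L_x(t,t') \hat x_i(t) \hat x_i(t') + \hat K_x(t,t') x_i(t) \hat x_i(t') \big] \Big) \\
& + N^{-1} \sum_i \log \int D[y_i, \hat y_i]\; p^{(i)}_{y,0}(y_i(0))\, \exp\Big( i \int dt\, \varphi_i(t) y_i(t) \Big) \\
& \times \exp\Big( i \int dt\, \hat y_i(t) \Big[ \frac{\dot y_i(t)}{y_i(t)} + r^{-1} \ln y_i(t) + \rho_y(t) - h_{y,i}(t) \Big] \Big) \\
& \times \exp\Big( -i \int dt\, dt'\, \big[ \hat C_y(t,t') y_i(t) y_i(t') + \hat L_y(t,t') \hat y_i(t) \hat y_i(t') + \hat K_y(t,t') y_i(t) \hat y_i(t') \big] \Big).
\end{aligned} \tag{C11}
\]

In this expression $p^{(i)}_{x,0}(\cdot)$ and $p^{(i)}_{y,0}(\cdot)$ describe the distributions from which initial conditions are drawn. 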



The next step is to perform the integrals in Eq. (C8) by means of the saddle-point method, valid in the limit 
$N \to \infty$. This amounts to finding the extrema of the term in the exponent. Setting the variation with respect to the 
integration variables $C_x$, $K_x$ and $L_x$ to zero gives 

\[
i \hat C_x(t,t') = \frac{1}{2} L_y(t,t'), \qquad i \hat K_x(t,t') = \Gamma K_y(t',t), \qquad i \hat L_x(t,t') = \frac{1}{2} C_y(t,t'), \tag{C12}
\]

and similarly we obtain 

\[
i \hat C_y(t,t') = \frac{1}{2} L_x(t,t'), \qquad i \hat K_y(t,t') = \Gamma K_x(t',t), \qquad i \hat L_y(t,t') = \frac{1}{2} C_x(t,t') \tag{C13}
\]

from the variation with respect to $C_y$, $K_y$ and $L_y$. 

It remains to perform the extremisation with respect to $\hat C_x$, $\hat K_x$, $\hat L_x$, and with respect to the corresponding quantities 
with subscript $y$. We find 

\[
\begin{aligned}
C_x(t,t') &= \lim_{N\to\infty} N^{-1} \sum_i \langle x_i(t) x_i(t') \rangle_\Omega, & C_y(t,t') &= \lim_{N\to\infty} N^{-1} \sum_i \langle y_i(t) y_i(t') \rangle_\Omega, \\
K_x(t,t') &= \lim_{N\to\infty} N^{-1} \sum_i \langle x_i(t) \hat x_i(t') \rangle_\Omega, & K_y(t,t') &= \lim_{N\to\infty} N^{-1} \sum_i \langle y_i(t) \hat y_i(t') \rangle_\Omega, \\
L_x(t,t') &= \lim_{N\to\infty} N^{-1} \sum_i \langle \hat x_i(t) \hat x_i(t') \rangle_\Omega, & L_y(t,t') &= \lim_{N\to\infty} N^{-1} \sum_i \langle \hat y_i(t) \hat y_i(t') \rangle_\Omega,
\end{aligned} \tag{C14}
\]

where the average $\langle \cdots \rangle_\Omega$ is to be taken against a measure defined by the exponent of the expression in Eq. (C11), see 
e.g. [33l EH [36] for similar calculations. 

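Looking back at the definition of the generating functional, Eq. (C3), one also realises that 

\[
\begin{aligned}
C_x(t,t') &= -\lim_{N\to\infty} N^{-1} \sum_i \frac{\delta^2\, \mathbb{E}[Z[\psi, \varphi]]}{\delta\psi_i(t)\, \delta\psi_i(t')} \bigg|_{\psi=\varphi=h=0}, \\
K_x(t,t') &= -\lim_{N\to\infty} N^{-1} \sum_i \frac{\delta^2\, \mathbb{E}[Z[\psi, \varphi]]}{\delta\psi_i(t)\, \delta h_{x,i}(t')} \bigg|_{\psi=\varphi=h=0}, \\
L_x(t,t') &= -\lim_{N\to\infty} N^{-1} \sum_i \frac{\delta^2\, \mathbb{E}[Z[\psi, \varphi]]}{\delta h_{x,i}(t)\, \delta h_{x,i}(t')} \bigg|_{\psi=\varphi=h=0},
\end{aligned} \tag{C15}
\]

and 

\[
\begin{aligned}
C_y(t,t') &= -\lim_{N\to\infty} N^{-1} \sum_i \frac{\delta^2\, \mathbb{E}[Z[\psi, \varphi]]}{\delta\varphi_i(t)\, \delta\varphi_i(t')} \bigg|_{\psi=\varphi=h=0}, \\
K_y(t,t') &= -\lim_{N\to\infty} N^{-1} \sum_i \frac{\delta^2\, \mathbb{E}[Z[\psi, \varphi]]}{\delta\varphi_i(t)\, \delta h_{y,i}(t')} \bigg|_{\psi=\varphi=h=0}, \\
L_y(t,t') &= -\lim_{N\to\infty} N^{-1} \sum_i \frac{\delta^2\, \mathbb{E}[Z[\psi, \varphi]]}{\delta h_{y,i}(t)\, \delta h_{y,i}(t')} \bigg|_{\psi=\varphi=h=0}.
\end{aligned} \tag{C16}
\]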



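Given that $Z[\psi = 0, \varphi = 0, h] = 1$ for all $h$ due to normalisation we conclude that $L_x(t,t') = L_y(t,t') = 0$ for all $t, t'$. 

The variables $\psi$ and $\varphi$ have now served their purpose (to generate correlation functions), and we set them to zero. 
We will also assume uniform perturbations $h_{x,i}(t) = h_x(t)$ and $h_{y,i}(t) = h_y(t)$ for all $i$, and that initial conditions are 
chosen from identical distributions for all components $x_i$ and $y_i$ (i.e. $p^{(i)}_{x,0}(\cdot)$ does not depend on $i$, and similarly for 
$p^{(i)}_{y,0}(\cdot)$). Then we have 

\[
\begin{aligned}
\Omega ={}& \log \int D[x, \hat x]\; p_{x,0}(x(0))\, \exp\Big( i \int dt\, \hat x(t) \Big[ \frac{\dot x(t)}{x(t)} + r^{-1} \ln x(t) + \rho_x(t) - h_x(t) \Big] \Big) \\
& \times \exp\Big( -\int dt\, dt'\, \Big[ \frac{1}{2} C_y(t,t') \hat x(t) \hat x(t') + i \Gamma G_y(t',t)\, x(t) \hat x(t') \Big] \Big) \\
& + \log \int D[y, \hat y]\; p_{y,0}(y(0))\, \exp\Big( i \int dt\, \hat y(t) \Big[ \frac{\dot y(t)}{y(t)} + r^{-1} \ln y(t) + \rho_y(t) - h_y(t) \Big] \Big) \\
& \times \exp\Big( -\int dt\, dt'\, \Big[ \frac{1}{2} C_x(t,t') \hat y(t) \hat y(t') + i \Gamma G_x(t',t)\, y(t) \hat y(t') \Big] \Big),
\end{aligned} \tag{C17}
\]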



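where we have used the above saddle-point results, and where we have introduced $G_x(t,t') = -i K_x(t,t')$ and $G_y(t,t') = -i K_y(t,t')$. 

The resulting term 

\[
\begin{aligned}
Z_{\rm eff} = \int D[x, \hat x]\, D[y, \hat y]\; & p_{x,0}(x(0))\, p_{y,0}(y(0))\, \exp\Big( i \int dt\, \hat x(t) \Big[ \frac{\dot x(t)}{x(t)} + r^{-1} \ln x(t) + \rho_x(t) - h_x(t) \Big] \Big) \\
& \times \exp\Big( -\int dt\, dt'\, \Big[ \frac{1}{2} C_y(t,t') \hat x(t) \hat x(t') + i \Gamma G_y(t',t)\, x(t) \hat x(t') \Big] \Big) \\
& \times \exp\Big( i \int dt\, \hat y(t) \Big[ \frac{\dot y(t)}{y(t)} + r^{-1} \ln y(t) + \rho_y(t) - h_y(t) \Big] \Big) \\
& \times \exp\Big( -\int dt\, dt'\, \Big[ \frac{1}{2} C_x(t,t') \hat y(t) \hat y(t') + i \Gamma G_x(t',t)\, y(t) \hat y(t') \Big] \Big)
\end{aligned} \tag{C18}
\]

is recognised as the generating function of the effective dynamics 

\[
\begin{aligned}
\dot x(t) &= x(t) \Big[ \Gamma \int dt'\, G_y(t,t') x(t') - r^{-1} \ln x(t) - \rho_x(t) + \eta_x(t) + h_x(t) \Big], \\
\dot y(t) &= y(t) \Big[ \Gamma \int dt'\, G_x(t,t') y(t') - r^{-1} \ln y(t) - \rho_y(t) + \eta_y(t) + h_y(t) \Big],
\end{aligned} \tag{C19}
\]

where 

\[
\begin{aligned}
& G_x(t,t') = \Big\langle \frac{\delta x(t)}{\delta h_x(t')} \Big\rangle_*, \qquad G_y(t,t') = \Big\langle \frac{\delta y(t)}{\delta h_y(t')} \Big\rangle_*, \\
& \langle \eta_x(t) \eta_x(t') \rangle_* = \langle y(t) y(t') \rangle_*, \qquad \langle \eta_y(t) \eta_y(t') \rangle_* = \langle x(t) x(t') \rangle_*, \\
& \langle x(t) \rangle_* = \langle y(t) \rangle_* = 1,
\end{aligned} \tag{C20}
\]

and where $\langle \cdots \rangle_*$ denotes an average over realizations of the effective dynamics (C19). This is to be evaluated at 
vanishing perturbation fields $h_x(t) = h_y(t) = 0$. It is hence appropriate to consider 

\[
\begin{aligned}
\dot x(t) &= x(t) \Big[ \Gamma \int dt'\, G_y(t,t') x(t') - r^{-1} \ln x(t) - \rho_x(t) + \eta_x(t) \Big], \\
\dot y(t) &= y(t) \Big[ \Gamma \int dt'\, G_x(t,t') y(t') - r^{-1} \ln y(t) - \rho_y(t) + \eta_y(t) \Big],
\end{aligned} \tag{C21}
\]

where 

\[
\begin{aligned}
& \langle \eta_x(t) \eta_x(t') \rangle_* = C_y(t,t') = \langle y(t) y(t') \rangle_*, \qquad \langle \eta_y(t) \eta_y(t') \rangle_* = C_x(t,t') = \langle x(t) x(t') \rangle_*, \\
& \langle x(t) \rangle_* = \langle y(t) \rangle_* = 1.
\end{aligned} \tag{C22}
\]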



We note that the path-integral analysis up to this point can also be carried out for the discrete dynamics. In this 
case one obtains the following effective process: 

\[
\begin{aligned}
x(t+1) &= \frac{x(t)^{1-\alpha} \exp\big( \beta \big[ \Gamma \sum_{t'} G_y(t,t') x(t') + \eta_x(t) \big] \big)}{Z_x(t)}, \\
y(t+1) &= \frac{y(t)^{1-\alpha} \exp\big( \beta \big[ \Gamma \sum_{t'} G_x(t,t') y(t') + \eta_y(t) \big] \big)}{Z_y(t)},
\end{aligned} \tag{C23}
\]

with self-consistency relations as in Eq. (C22). Due to causality we have $G(t,t') = 0$ for $t' > t$, both in the continuous-time 
and in the discrete-time case, so the integrals over $t'$ in Eqs. (C21) and the sums in Eq. (C23) only extend over 
the range $t' < t$. 



b. Fixed point analysis 

In the stationary state all two-time quantities (e.g. $C_x(t,t')$, $G_x(t,t')$) become functions of time differences only, i.e. 
$G_x(t,t') = G_x(\tau)$, where $\tau = t - t'$, and similarly for the other two-time observables. Assuming the dynamics reaches 
a fixed point one also has $C_x(t,t') = \text{const}$ and similarly for $C_y(t,t')$. 

Fixed points of the discrete-time effective dynamics (C23) are given by 

\[
\begin{aligned}
-\alpha \ln x^* + \Gamma \beta \chi_y x^* + \beta \eta^*_x - \ln Z^*_x &= 0, \\
-\alpha \ln y^* + \Gamma \beta \chi_x y^* + \beta \eta^*_y - \ln Z^*_y &= 0,
\end{aligned} \tag{C24}
\]

where we have written $\chi_x = \int_0^\infty d\tau\, G_x(\tau)$ and $\chi_y = \int_0^\infty d\tau\, G_y(\tau)$. An asterisk as a superscript indicates fixed-point 
quantities as before. From the continuous-time effective process, Eq. (C21), one obtains the equivalent fixed-point 
condition 

\[
\begin{aligned}
-r^{-1} \ln x^* + \Gamma \chi_y x^* + \eta^*_x - \rho^*_x &= 0, \\
-r^{-1} \ln y^* + \Gamma \chi_x y^* + \eta^*_y - \rho^*_y &= 0.
\end{aligned} \tag{C25}
\]

Due to symmetry we expect $\chi_x = \chi_y = \chi$, $\rho^*_x = \rho^*_y = \rho$, see also [2111]. We will also write 

\[
q = \langle (x^*)^2 \rangle_* = \langle (y^*)^2 \rangle_*. \tag{C26}
\]



Let us write $\eta^*_x = \sqrt{q}\, z$ with $z$ a static Gaussian random variable of mean zero and unit variance. Then let $x(z)$ be 
the positive solution, $x$, of 

\[
-r^{-1} \ln x + \Gamma \chi x + \sqrt{q}\, z - \rho = 0. \tag{C27}
\]

The order parameters $\chi$, $q$ and $\rho$ are to be determined from the self-consistency relations 

\[
\chi = \frac{1}{\sqrt{q}} \Big\langle \frac{\partial x(z)}{\partial z} \Big\rangle_z, \qquad q = \langle x(z)^2 \rangle_z, \qquad 1 = \langle x(z) \rangle_z, \tag{C28}
\]

in other words, we have 

\[
\begin{aligned}
\chi &= \frac{1}{\sqrt{q}} \int_{-\infty}^{\infty} Dz\, \frac{\partial x(z)}{\partial z}, \\
q &= \int_{-\infty}^{\infty} Dz\, x(z)^2, \\
1 &= \int_{-\infty}^{\infty} Dz\, x(z),
\end{aligned} \tag{C29}
\]

where $Dz = \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2}$. These equations fully determine the statistical properties of the fixed points of the dynamics, 
and can be used to compute quantities such as the distribution of frequencies with which pure actions are played 
(i.e. the shape of the resulting mixed strategy profile), or the entropy of mixed strategies. Theoretical predictions are 
tested against simulations below (see Sec. D1). 
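
To make the ingredients of the self-consistency problem concrete, the sketch below solves the fixed-point condition (C27) for $x(z)$ at given trial values of $\chi$, $q$, $\rho$ and evaluates the three Gaussian averages entering Eqs. (C29). It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, $\Gamma\chi \le 0$ is assumed (so that the left-hand side of (C27) is monotonic in $x$ and the positive root is unique), bisection in $u = \ln x$ is used for robustness in place of the Newton-Raphson iteration mentioned in Appendix D, and Gauss-Hermite quadrature replaces a truncated integration range.

```python
import numpy as np

def solve_x(z, chi, q, rho, r, gamma, n_iter=100):
    """Positive root x(z) of -ln(x)/r + gamma*chi*x + sqrt(q)*z - rho = 0.

    Assumes gamma*chi <= 0. Works with u = ln(x) and plain bisection.
    """
    z = np.asarray(z, dtype=float)
    c = np.sqrt(q) * z - rho
    k = gamma * chi

    def g(u):
        # the clip only guards against floating-point overflow in exp
        return -u / r + k * np.exp(np.minimum(u, 700.0)) + c

    lo = np.minimum(r * (c - abs(k)), 0.0)   # g(lo) >= 0 by construction
    hi = r * c                               # g(hi) <= 0 because k <= 0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        positive = g(mid) > 0.0
        lo = np.where(positive, mid, lo)
        hi = np.where(positive, hi, mid)
    return np.exp(0.5 * (lo + hi))

def gaussian_moments(chi, q, rho, r, gamma, n_nodes=60):
    """Return (<x>, <x^2>, q^(-1/2) <dx/dz>) under the Gaussian measure Dz."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)   # weights now sum to one
    x = solve_x(nodes, chi, q, rho, r, gamma)
    # implicit differentiation of the fixed-point condition with respect to z
    dx_dz = np.sqrt(q) / (1.0 / (r * x) - gamma * chi)
    return weights @ x, weights @ x**2, (weights @ dx_dz) / np.sqrt(q)
```

A self-consistent solution is a triple $(\chi, q, \rho)$ for which `gaussian_moments` returns `(1, q, chi)`; how best to iterate towards it (damped fixed-point iteration, a root finder, etc.) is left open here.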



c. Linear stability analysis 

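We will now carry out a linear stability analysis of the effective dynamics in the continuous-time case. We mostly 
follow the approach first proposed in [5, 34]. As a first step we assume the dynamics is perturbed by small noise 
terms, $\xi(t)$ and $\zeta(t)$: 

\[
\begin{aligned}
\dot x(t) &= x(t) \Big[ \Gamma \int dt'\, G_y(t,t') x(t') - r^{-1} \ln x(t) - \rho_x(t) + \eta_x(t) + \xi(t) \Big], \\
\dot y(t) &= y(t) \Big[ \Gamma \int dt'\, G_x(t,t') y(t') - r^{-1} \ln y(t) - \rho_y(t) + \eta_y(t) + \zeta(t) \Big],
\end{aligned} \tag{C30}
\]

and that we have small perturbations about a fixed point, i.e. 

\[
\begin{aligned}
x(t) &= x^* + \hat x(t), && \text{(C31)} \\
y(t) &= y^* + \hat y(t), && \text{(C32)} \\
\eta_x(t) &= \eta^*_x + \hat v(t), && \text{(C33)} \\
\eta_y(t) &= \eta^*_y + \hat w(t). && \text{(C34)}
\end{aligned}
\]

Perturbations are here labelled by hats on the corresponding variables; this is not to be confused with the notation 
$\hat x_i, \hat y_j$ etc. in earlier sections, where, in the course of computing the generating functional, hats indicated conjugate 
variables. Following [5, 6] we restrict the analysis to cases where $x^* > 0$ and $y^* > 0$. Expanding to linear order in the 
deviations from the fixed point we then have 

\[
\begin{aligned}
\frac{d}{dt} \hat x(t) &= -r^{-1} \hat x(t) + x^* \Big[ \Gamma \int dt'\, G_y(t - t') \hat x(t') + \hat v(t) + \xi(t) \Big], \\
\frac{d}{dt} \hat y(t) &= -r^{-1} \hat y(t) + y^* \Big[ \Gamma \int dt'\, G_x(t - t') \hat y(t') + \hat w(t) + \zeta(t) \Big].
\end{aligned} \tag{C35}
\]

In Fourier space we have 

\[
\begin{aligned}
\Big[ \frac{i\omega + r^{-1}}{x^*} - \Gamma G_y(\omega) \Big] \hat x(\omega) &= \hat v(\omega) + \xi(\omega), \\
\Big[ \frac{i\omega + r^{-1}}{y^*} - \Gamma G_x(\omega) \Big] \hat y(\omega) &= \hat w(\omega) + \zeta(\omega),
\end{aligned} \tag{C36}
\]

for which we will introduce the short-hand notation 

\[
\begin{aligned}
A(\omega, x^*)\, \hat x(\omega) &= \hat v(\omega) + \xi(\omega), \\
B(\omega, y^*)\, \hat y(\omega) &= \hat w(\omega) + \zeta(\omega).
\end{aligned} \tag{C37}
\]

Denoting the fraction of strategies played with non-zero probability by $\phi$ (not to be confused with the memory-loss 
parameter $\varphi$ in earlier sections), and taking into account that we are only considering components with $x^* > 0$, $y^* > 0$, 
this gives (for details of similar calculations see [5]) 

\[
\begin{aligned}
\langle |\hat x(\omega)|^2 \rangle_* &= \phi \big( \langle |\hat y(\omega)|^2 \rangle_* + 1 \big) \Big\langle \frac{1}{|A(\omega, x^*)|^2} \Big\rangle_*, \\
\langle |\hat y(\omega)|^2 \rangle_* &= \phi \big( \langle |\hat x(\omega)|^2 \rangle_* + 1 \big) \Big\langle \frac{1}{|B(\omega, y^*)|^2} \Big\rangle_*,
\end{aligned} \tag{C38}
\]

where we have used the self-consistency relations $\langle |\hat v(\omega)|^2 \rangle_* = \langle |\hat y(\omega)|^2 \rangle_*$ and $\langle |\hat w(\omega)|^2 \rangle_* = \langle |\hat x(\omega)|^2 \rangle_*$. 

Again following [5] let us now focus on the $\omega = 0$ mode. Using the symmetry between players we have 
$\langle |\hat x(\omega = 0)|^2 \rangle_* = \langle |\hat y(\omega = 0)|^2 \rangle_*$, and hence we find 

\[
\langle |\hat x(\omega = 0)|^2 \rangle_* = \Big[ \Big( \phi \Big\langle \frac{1}{|A(\omega = 0, x^*)|^2} \Big\rangle_* \Big)^{-1} - 1 \Big]^{-1}. \tag{C39}
\]

This expression diverges as 

\[
\phi \Big\langle \frac{1}{|A(\omega = 0, x^*)|^2} \Big\rangle_* \to 1, \tag{C40}
\]

signalling the onset of instability. In particular Eq. (C39) predicts a negative value of $\langle |\hat x(\omega = 0)|^2 \rangle_*$ if 
$\phi \langle 1/|A(\omega = 0, x^*)|^2 \rangle_* > 1$, indicating that our self-consistent fixed-point solution breaks down. Eq. (C40) therefore 
defines the boundary of the stable fixed point phase, and was used to generate the stability diagram in the main paper 
(Fig. 2). The fraction of active strategies is here given by $\phi = 1$, following our solution for fixed points of the effective 
process (we find that Eq. (C27) has positive solutions $x(z)$ for all values of $z$, provided $\Gamma < 0$). 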



Appendix D: Numerical methods and simulation results 
1. Test of theoretical predictions against simulations 

a. Order parameters in fixed point phase 



Eqs. (C29) together with Eq. (C27) are the final result of our path-integral analysis in the fixed-point phase. 
These equations determine the relevant order parameters $\chi$, $q$ and $\rho$ self-consistently. We notice the high degree of 
nonlinearity due to the logarithmic term in (C27). In the absence of this term (i.e. for $r^{-1} = 0$) the resulting equations are 
linear, the Gaussian integrals in (C29) can be carried out, and the resulting equations can be simplified further, see 
[5, 35, 36] for details. In the presence of memory-loss ($r^{-1} > 0$) this is not possible however, and we have to approach 
the self-consistency problem numerically. We here restrict the analysis to the case $\Gamma < 0$, when a positive solution 
of (C27) is found for all values of $z$. Numerically solving Eq. (C27) for $x(z)$ with an iterative Newton-Raphson 
procedure then allows us to determine the order parameters $\chi, q, \rho$ [48]. Once these order parameters are determined 
the distribution of the components of the strategy vectors can be obtained from solving the above Eq. (C27), 

\[
-r^{-1} \ln x(z) + \Gamma \chi x(z) + \sqrt{q}\, z - \rho = 0.
\]

More precisely one has 

\[
P(x) = \int \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2}\, \delta\big( x - x(z) \big) \tag{D1}
\]

for the distribution of fixed points of the effective process. Recalling that degrees of freedom in the path-integral 
analysis have been obtained from the original strategy components by a re-scaling with a factor of $N$ ($\sum_i x_i = N$ 
instead of $\sum_i x_i = 1$), an analytical prediction for the distribution of strategy components of the original problem at 
a large but finite value of $N$ can be obtained using Eq. (D1), and upon undoing this re-scaling. Results are shown in 
Fig. 4 of this Supplementary Information (left-hand panel). As seen in the figure the analytical predictions for this 
highly non-trivial and non-Gaussian distribution agree rather well with results from direct simulations of the original 
learning dynamics. 

FIG. 4: Test of theoretical predictions for the stable phase against simulations. The left-hand panel shows the distribution 
of components $x_i$ of mixed strategies at the fixed point ($\Gamma = -0.5$). Solid lines are theoretical predictions, noisy lines from 
simulations. The right-hand panel shows the entropy $S$ of the fixed point strategies of players. Symbols are from simulations 
(at $N = 100$ strategies per player, simulations run for 7,500 time steps, with measurements starting after 5,000 time steps). 
Averages over 100 different payoff matrices are taken. Solid lines are from the theory, hence only shown in the stable phase 
in the right-hand panel. Agreement with simulations is good, except for small deviations near the onset of instability. We 
attribute these to finite-size and equilibration effects. All data in this figure is taken at $\beta = 0.01$. 



We can also determine the entropy of a typical mixed strategy of a system at finite $N$ at the fixed point as follows. 
Given the normalisation $\sum_i x_i = N$ we define $S$ to be the entropy of the mixed strategy vector $(x_1/N, \dots, x_N/N)$, 
i.e. 

where $\langle \cdots \rangle_z$ denotes an average over $z$, i.e. $\langle \cdots \rangle_z = \int \frac{dz}{\sqrt{2\pi}} \cdots e^{-z^2/2}$. Results are shown in Fig. 4 of this Supplementary 
Information (right panel), and again theoretical predictions and direct measurements from simulations agree very 
well. We note that mixed strategies concentrate on the centre of strategy space for $\alpha/\beta \to \infty$, i.e. for very quick 
memory loss. In this case one has $x_i \approx 1$ for all $i$ (recall the normalisation $\sum_i x_i = N$), i.e. 

As a final remark we point out that Eqs. (C29) and Eq. (C27) are valid only in the fixed point phase, as the 
assumption of a fixed point was explicitly made in deriving these relations. We are therefore only able to predict the 
statistics of the solution in the stable fixed point phase. The solution of the effective dynamics below the transition, in 
the chaotic regime, is a formidable task. No promising approaches are available, similar to the lack of analytical handles 
for example on the 'turbulent' so-called non-ergodic phase of the minority game [33]. 
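
For reference, the entropy of a mixed strategy under the normalisation $\sum_i x_i = N$ used above can be evaluated from simulation data as in the following sketch. The function name `strategy_entropy` is hypothetical, and the Shannon form with natural logarithm is an assumption made here, since the defining equation is not reproduced in this text.

```python
import numpy as np

def strategy_entropy(x):
    """Shannon entropy (natural log) of the mixed strategy x / sum(x).

    Accepts the re-scaled components (sum(x) = N) or probabilities
    (sum(x) = 1); the vector is normalised before use.
    """
    p = np.asarray(x, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                     # 0 * log(0) is taken to be 0
    return float(-(p * np.log(p)).sum())
```

With this convention the uniform strategy $x_i = 1$ for all $i$ has entropy $\ln N$, and a pure strategy has entropy zero.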



b. Onset of instability 



FIG. 5: Test of theoretical predictions for the stability diagram. Solid line shows the onset of instability as predicted by the 
theory (see Eq. (C40)). Markers show results from simulations (see text for details). All data in this figure is taken at $\beta = 0.01$. 

The validity of the analytical predictions for the onset of instability (boundary of the chaotic phase) has already been 
successfully confirmed in simulations in Fig. 2 of the main paper, where we have measured the expected dimension 
of the dynamical attractors in parameter space. These simulations are time-consuming and were therefore limited to 
systems of dimension $2(N - 1) = 98$. In order to provide a more precise verification we have determined the onset of 
instability in larger systems in Fig. 5 of this Supplementary Material. The numerical data is here obtained as follows: 

1. For a fixed value of $\Gamma$ generate $M$ samples of the payoff bi-matrix. 

2. For these $M$ realisations of the game, run the dynamics at large $\alpha/\beta$ and, for each sample, determine whether 
or not it reaches a stable fixed point. 

3. If the majority of the $M$ samples converges to a fixed point, lower the value of $\alpha/\beta$ and repeat step 2 until more 
than half of the samples no longer converge. 

4. Record this value of $\alpha/\beta$ as the onset of instability, and proceed to a new value of $\Gamma$ in 1. 
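
The scan in steps 1 to 4 can be sketched, for a single value of $\Gamma$, as below. This is a schematic under stated assumptions rather than the code used for Fig. 5: `converges` is a hypothetical user-supplied callable that runs the dynamics for a given sample index and ratio $\alpha/\beta$ and reports whether a fixed point was reached, and the grid of ratios is likewise left to the caller.

```python
from typing import Callable, Optional, Sequence

def onset_of_instability(converges: Callable[[int, float], bool],
                         ratios: Sequence[float],
                         n_samples: int) -> Optional[float]:
    """Scan alpha/beta from large to small values.

    Returns the first ratio at which more than half of the n_samples
    realisations fail to converge, or None if that never happens.
    """
    for ratio in sorted(ratios, reverse=True):
        n_failed = sum(not converges(sample, ratio)
                       for sample in range(n_samples))
        if n_failed > n_samples / 2:
            return ratio
    return None
```

The resolution of the estimate is set by the spacing of `ratios`; a finer grid near the expected transition sharpens the result at the cost of more simulation runs.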

In the simulations of Fig. 5 we have used $M = 10$ samples. A given run is considered to reach a fixed point if 
both (i) all eigenvalues of the Jacobian at a final time $T$ are within the unit circle and (ii) the total fluctuations 



3/TEL 2 /3T^W 2 -f3/TE^ 



Xi(t) 



are less than a pre-defined threshold $\vartheta$. In our simulations we 
have used $T = 15{,}000$ and $\vartheta = 10^{-5}$. If these criteria are not fulfilled the run is considered not to converge. We 
cannot entirely exclude the possibility of identifying runs as non-convergent when in fact they do converge on time scales larger than 
$T$. In this sense we cannot exclude a potential over-estimation of the value of $\alpha/\beta$ at which the instability sets in 
in the numerical results presented in Fig. 5. The agreement with the theoretical predictions is very good however. 
Small deviations can be attributed to the effect just discussed, and to the fact that the theoretical prediction of the 
instability line is obtained for the continuous-time dynamics, whereas simulations are carried out for the discrete-time 
map at $\beta = 0.01$. Additionally there may be finite-size effects. 



2. Estimation of the attractor dimension 

The Liapunov spectrum of the attractors is determined using a procedure similar to that described in [38]. 
Measurements are started after some equilibration time ($t_{\rm eq} = 150{,}000$ iterations), after which we run a linearised 
map 

\[
z(t + 1) = J_{x(t), y(t)}\, z(t),
\]

parallel to the simulation of the original system, with $L = 2(N - 1)$ degrees of freedom. The $L \times L$ matrix $J_{x(t), y(t)}$ 
is the Jacobian of the full non-linear system. We run $L$ copies of the linearized dynamics, $z^{(1)}, \dots, z^{(L)}$, started from 
the $L$ unit vectors. We then regularly perform a stabilized Gram-Schmidt procedure, and obtain estimates of the 
Liapunov exponents [38]. From these estimates one then calculates the Kaplan-Yorke dimension as 

\[
D = j + \frac{\sum_{i=1}^{j} \lambda_i}{|\lambda_{j+1}|}, \tag{D5}
\]

where the Liapunov exponents are ordered as $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_L$, and where $j$ is the largest integer such that 
$\lambda_1 + \dots + \lambda_j \ge 0$ [38, 39]. The estimates of the attractor dimension may fluctuate as the simulation run continues 
after equilibration, and as the attractor is sampled. In practice we find that the measured dimension tends to converge 
in most runs as the duration of the simulation increases. In our simulations we consider the attractor dimension in 
a given run as converged when the maximum and minimum estimates of the dimension in a 
time window of 20,000 iteration steps differ by less than 5%. The dimension reported is then the 
average over that time window. In other words, simulations are first run for 150,000 steps to equilibrate, then at 
least 20,000 additional iterations are performed during which measurements are taken. Subsequently the simulation 
is extended (up to at most $10^6$ iterations) until the convergence criterion is met. In practice we find that most samples 
have converged at $10^6$ iterations or earlier, when we terminate our simulation. Examples of such measurements are 
shown in Fig. 6 of this Supplementary Information; the data shown corresponds to the attractors shown in Fig. 1 
of the main paper. Samples that have not converged on this time scale are ignored in our analysis, and have been 
disregarded when compiling the data for Fig. 2 of the main paper. We here find that only a small fraction of samples 
converges when the attractor dimension is very high. 
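
The two ingredients of this estimate, the Liapunov spectrum from repeated re-orthonormalisation and the Kaplan-Yorke formula (D5), can be sketched as follows. This is a generic illustration, not the authors' implementation: the function names are hypothetical, a QR decomposition (numerically equivalent to a stabilised Gram-Schmidt step) is applied at every iteration, and the map and its Jacobian are passed in as callables so that the sketch does not depend on the details of the learning dynamics.

```python
import numpy as np

def lyapunov_spectrum(jacobian, step, state, n_steps):
    """Estimate the Liapunov exponents of the map state -> step(state).

    jacobian(state) must return the square Jacobian matrix of the map.
    The tangent vectors are re-orthonormalised by QR at every step.
    """
    dim = state.size
    q = np.eye(dim)                      # one tangent vector per unit vector
    log_growth = np.zeros(dim)
    for _ in range(n_steps):
        q = jacobian(state) @ q          # linearised map along the orbit
        state = step(state)
        q, r = np.linalg.qr(q)
        log_growth += np.log(np.abs(np.diag(r)))
    return np.sort(log_growth / n_steps)[::-1]

def kaplan_yorke_dimension(exponents):
    """Kaplan-Yorke dimension D = j + (lam_1 + ... + lam_j) / |lam_{j+1}|."""
    lam = np.sort(np.asarray(exponents, dtype=float))[::-1]
    partial = np.cumsum(lam)
    if partial[0] < 0:
        return 0.0                       # all exponents negative
    # largest j with lam_1 + ... + lam_j >= 0
    j = int(np.nonzero(partial >= 0)[0].max()) + 1
    if j == lam.size:
        return float(j)                  # no contracting direction left over
    return j + partial[j - 1] / abs(lam[j])
```

In a production run one would typically discard a transient before accumulating `log_growth` and monitor the running estimate of the dimension over a time window, as described in the text.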



3. Return distribution 



The time series of 'returns' in Fig. 3 of the main manuscript shows the changes of total payoff to the two players. 
Specifically, we measure 

\[
\Pi_{\rm tot}(t) = \sum_i \sum_j \big[ x_i(t) a_{ij} y_j(t) + y_i(t) b_{ij} x_j(t) \big] \tag{D6}
\]

at each time step $t$ in the equilibrated regime, and then plot $\Pi_{\rm tot}(t) - \Pi_{\rm tot}(t - 1)$ in Fig. 3 of the main paper. The 
corresponding distribution of returns is shown in Fig. 7 of this SI, and shows exponential tails. 
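
The quantities in Eq. (D6) translate directly into a few lines of array code. The sketch below is an illustration with hypothetical function names; it assumes `x`, `y` are the strategy vectors at one time step and `a`, `b` the fixed payoff matrices.

```python
import numpy as np

def total_payoff(x, y, a, b):
    """Total payoff sum_ij [x_i a_ij y_j + y_i b_ij x_j] at one time step."""
    return float(x @ a @ y + y @ b @ x)

def returns(payoffs):
    """Differences Pi_tot(t) - Pi_tot(t - 1) of a payoff time series."""
    return np.diff(np.asarray(payoffs, dtype=float))
```

A histogram of `returns(...)` over the equilibrated part of a run gives the empirical return distribution.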
microscopic constraint. This has been omitted here to reduce the overall complexity of the calculation. Similar methods 
have been used e.g. in 

The integrals in Eq. (C29) are evaluated numerically, and the integration range necessarily needs to be truncated during 
this procedure. Our results are therefore numerical estimates of the actual solution. 



